Empirical study and model of personal income 



Wataru Souma 1 and Makoto Nirei 2 

1 ATR Network Informatics Laboratories, Kyoto 619-0288, Japan, souma@atr.jp 

2 Utah State University, Logan, UT 84322, US. mnirei@econ.usu.edu 

Summary. Personal income distributions in Japan are analyzed empirically 
and a simple stochastic model of the income process is proposed. Based on 
empirical facts, we propose a minimal two-factor model. Our model of personal 
income consists of an asset accumulation process and a wage process. We 
show that these simple processes can successfully reproduce the empirical
distribution of income. In particular, the model reproduces the characteristic
transition of the distribution shape from the middle part to the tail part. The
model also allows us to derive the tail exponent of the distribution analytically.

1 Introduction

Many economists and physicists have studied wealth and income. About one
hundred years ago, Pareto found a power law distribution of wealth and income
[13]. Gibrat later clarified that the power law applies only to the high wealth
and income range, and that the remaining part follows a lognormal distribution
[7]. This characteristic of wealth and income was later rediscovered [2] [10]
[16] [17]. Today, it is generally believed that high wealth and income follow a
power law distribution. However, the form of the remaining range of the
distribution has not been settled. Recently an exponential distribution [5] and
a Boltzmann distribution [20] have been proposed.

To explain these characteristics of wealth and income, several mathematical
models have been proposed. One class of them is based on a stochastic
multiplicative process (SMP). Examples include the SMP with a lower bound [9],
the SMP with additive noise [15] [19], the SMP with wealth exchange [4], and
the generalized Lotka-Volterra model [3] [14].

This paper is organized as follows. In Sec. 2, we empirically study the 
personal income distribution in Japan. In Sec. 3, we propose a two-factor
stochastic model to explain income distribution. The last section is devoted 
to a summary and discussion. 

Fig. 1. A log-log plot of the distribution of employment income in 1999 (left). A
log-log plot of the distributions in 1999 of self-assessed income, the sum of
employment income and self-assessed income, income tax data for top taxpayers,
adjusted income tax data, and total income (right).



2 Empirical study of the personal income distribution 

In this article we use three data sets: employment income data, self-assessed
income data, and income tax data for top taxpayers. The employment income data
are coarsely tabulated data on the distribution of wages in the private sector,
reported by the National Tax Agency of Japan (NTAJ) [11]. They consist of two
kinds of data. One covers employment income earners who worked for less than a
year and is available from 1951. For example, a log-log plot of the rank-size
distribution of these data in 1999 is shown by the open circles in the left
panel of Fig. 1. The other covers employment income earners who worked
throughout the year and is available from 1950. For example, the distribution
in 1999 is shown by the open squares in the left panel of Fig. 1. In this figure
the crosses are the sum of these two data sets, which is almost the same as the
distribution of employment income earners who worked throughout the year.

The self-assessed income data are also reported by NTAJ. They are also
coarsely tabulated and are available from 1887. The income tax law was changed
many times, and so the characteristics of these data also changed many times.
However, the data consistently contain high income earners. In Japan, in recent
years, persons who have several income sources, who earned more than 20 million
yen, or who are not employees must declare their income. For example, the
distribution in 1999 is shown by the open triangles in the right panel of
Fig. 1. In this figure the filled circles are the sum of the employment income
data and the self-assessed income data. In the range greater than 20 million
yen, however, we use only the self-assessed income data. This is because
persons who earned more than 20 million yen must declare their income even if
they are employees and have only one income source, so summing the two data
sets in that range would count them twice. This figure shows that the
distribution of middle and low income is almost the same as that of the
employment income. This means that the main income source of middle and low
income earners is wages.

In Japan, if the amount of one's income tax exceeds 10 million yen, the
individual's name and the amount of income tax are made public by each tax
office. Some data companies collect these records and produce income tax data
for top taxpayers. We obtained these data for 1987 to 2000. For example, the
distribution in 1999 is shown by the open diamonds in the right panel of Fig. 1.
To see the whole shape of the distribution, we must convert income tax to
income. We know from the self-assessed income data that the income of the
40,623rd person is 50 million yen. On the other hand, we know from the income
tax data for top taxpayers that the income tax of the 40,623rd person is 13.984
million yen. Hence, if we assume a linear relation between income and income
tax, we can convert income tax to income by multiplying the income tax by
3.5755 [1]. The dots in Fig. 1 represent the distribution of converted income
tax. This clearly shows the power law distribution in the high income range and
the characteristic transition of the distribution shape from the middle part to
the tail part.
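
The conversion factor follows from simple arithmetic on the two figures quoted for the 40,623rd person. The lines below are a minimal sketch of that arithmetic in Python, assuming the proportional relation stated in the text; the function name `tax_to_income` is hypothetical and is not taken from the paper.

```python
# Figures quoted in the text for the 40,623rd person in 1999 (million yen).
income_at_rank = 50.0
tax_at_rank = 13.984

# Proportionality constant under the assumed relation income = k * tax.
k = income_at_rank / tax_at_rank


def tax_to_income(tax_million_yen: float) -> float:
    """Convert income tax to estimated income (both in million yen)."""
    return k * tax_million_yen


print(round(k, 4))                     # 3.5755
print(round(tax_to_income(10.0), 2))   # 35.76, income at the 10 million yen tax threshold
```

Because the relation is assumed to be proportional, a single matched pair of values fixes the constant; a different matching rank would in general give a slightly different factor.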

2.1 Income sources 

Understanding income sources is important for modeling the income process. As
we saw above, the main income source of middle and low income earners is wages.
We can also see the income sources of high income earners from the NTAJ report.
The top panel of Fig. 2 shows the number of high income earners who earned more
than 50 million yen in each year from 2000 to 2003. In this figure income
sources are divided into 14 categories: business income, farm income, interest
income, dividends, rental income, wages & salaries, comprehensive capital
gains, sporadic income, miscellaneous income, forestry income, retirement
income, short-term separate capital gains, long-term separate capital gains,
and capital gains of stocks. The bottom panel of this figure shows the amount
of income for each income source. These figures show that the main income
sources of high income earners are wages and capital gains.

2.2 Change of distribution 

The rank-size distribution of all acquired data is shown in the top panel of 
Fig. 3. The gap found in this figure reflects the change of the income tax law. 
We fit distributions in the high income range by the power law distribution,
for which the probability density function is given by

$$p(x) = A x^{-(\alpha+1)},$$

Fig. 2. Income sources of high income earners from 2000 to 2003. The top panel
represents the number of high income earners, and the bottom panel represents
the amount of income. In both panels, A: business income, B: farm income, C:
interest income, D: dividends, E: rental income, F: wages & salaries, G:
comprehensive capital gains, H: sporadic income, I: miscellaneous income, J:
forestry income, K: retirement income, L: short-term separate capital gains, M:
long-term separate capital gains, and N: capital gains of stocks.

where $A$ is a normalization factor and $\alpha$ is called the Pareto index. A
small $\alpha$ corresponds to an unequal distribution. The change of $\alpha$
is shown by the open circles in the bottom panel of Fig. 3. The mean value of
the Pareto index is $\alpha = 2$, and $\alpha$ fluctuates around it.
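
The text does not say which fitting procedure was used for the high income range. As one possible illustration, the sketch below estimates the Pareto index with the Hill (maximum-likelihood) estimator and checks it on synthetic data. The estimator choice, the function name `hill_estimator`, the threshold `x_min`, and the synthetic sample are all assumptions made for this example and are not the authors' method or data.

```python
import math
import random


def hill_estimator(incomes, x_min):
    """Maximum-likelihood (Hill) estimate of the Pareto index alpha,
    using all observations at or above the threshold x_min."""
    tail = [x for x in incomes if x >= x_min]
    return len(tail) / sum(math.log(x / x_min) for x in tail)


# Synthetic check: draw from a Pareto law with alpha = 2 by inverse transform,
# so that P(X > x) = (x / x_min) ** (-alpha) for x >= x_min.
rng = random.Random(0)
true_alpha, x_min = 2.0, 1.0
sample = [x_min * (1.0 - rng.random()) ** (-1.0 / true_alpha)
          for _ in range(100_000)]

alpha_hat = hill_estimator(sample, x_min)
print(alpha_hat)  # close to 2 for a sample of this size
```

Note that a density with exponent $-(\alpha+1)$ corresponds to a cumulative (rank-size) distribution with exponent $-\alpha$, which is why the slope read from a log-log rank-size plot is the Pareto index itself.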

It is recognized that the period of modern economic growth in Japan runs from
the 1910s to the 1960s. It has been reported that the gross behavior of the
Gini coefficient in this period looks like an inverted U-shape [18]. This
behavior of the Gini coefficient is known as Kuznets's inverted U-shaped
relation between income inequality and economic growth [8]. It postulates that
in the early stages of modern economic growth both a country's economic growth
and its income inequality rise, and the Gini coefficient becomes large. For
developed countries income inequality shows a tendency to narrow, and the Gini
coefficient becomes small. Figure 3 shows that the gross behavior of the Pareto
index from the 1910s to the 1960s is almost the inverse of that of the Gini
coefficient, i.e., U-shaped. Since a small Pareto index corresponds to high
inequality, our analysis of the Pareto index also supports the validity of
Kuznets's inverted U-shaped relation.

We assume that the change of the Pareto index in the 1970s reflects the
slowdown in Japanese economic growth and the real estate boom. In Fig. 3 we can
also see that $\alpha$ decreases toward the year 1990 and increases after 1990,
i.e., a V-shaped relation. In Japan, the year 1990 was the peak of the
asset-inflation economic bubble. Hence the Pareto index decreases toward the
peak of the bubble economy and increases after the bubble bursts. The
correlation between the Pareto index and risk assets is also clarified in
Ref. [16].

We fit distributions in the low and middle income range by the log-normal
distribution, for which the probability density function is defined by

$$p(x) = \frac{1}{x\sqrt{2\pi\sigma^2}} \exp\left[-\frac{\log^2(x/x_0)}{2\sigma^2}\right],$$

where $x_0$ is the mean value and $\sigma^2$ is the variance. Sometimes
$\beta = 1/\sqrt{2\sigma^2}$ is called the Gibrat index. Since a large variance
means that income is spread over a wide range, a small $\beta$ corresponds to
an unequal distribution. The change of $\beta$ is shown by the crosses in the
bottom panel of Fig. 3. This figure shows that $\alpha$ and $\beta$ correlate
with each other around the years 1960 and 1980. However, they have no
correlation in the beginning of the 1970s and after 1985. In particular, after
1985 $\beta$ stays at almost the same value. This means that the variance of
the low and middle income distribution does not change. We assume that capital
gains cause the different behaviors of $\alpha$ and $\beta$, and that $\alpha$
is more sensitive to capital gains than $\beta$.
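
To make the relation between the variance and the Gibrat index concrete, the sketch below fits a log-normal distribution to a synthetic sample by taking moments of the log incomes and then computes $\beta = 1/\sqrt{2\sigma^2}$. It assumes natural logarithms (the text does not state the base), and the function name `gibrat_index` and the synthetic parameters are hypothetical.

```python
import math
import random
import statistics


def gibrat_index(incomes):
    """Return (x0, sigma2, beta) for a log-normal fit by moments of log income.
    Assumes natural logarithms; the source does not state the base."""
    logs = [math.log(x) for x in incomes]
    mu = statistics.fmean(logs)
    sigma2 = statistics.pvariance(logs, mu)
    return math.exp(mu), sigma2, 1.0 / math.sqrt(2.0 * sigma2)


# Synthetic sample with x0 = 4 and sigma = 0.5 (made-up values).
rng = random.Random(1)
sample = [rng.lognormvariate(math.log(4.0), 0.5) for _ in range(50_000)]

x0, sigma2, beta = gibrat_index(sample)
print(x0, sigma2, beta)  # roughly 4, 0.25, and 1.41
```

Doubling $\sigma$ halves $\beta$, which is the sense in which a small Gibrat index indicates a more widely spread, more unequal distribution.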

Fig. 4. A log-log plot of the cumulative distributions of normalized income from
1987 to 2000 (left) and a semi-log plot of them (right).

The top panel of Fig. 3 shows that the distribution moves to the right over
time. This motivates us to normalize the distributions by a quantity that
characterizes economic growth. Though many candidates exist, we simply
normalize the distributions by the average income. The left panel of Fig. 4 is
a log-log plot of the cumulative distributions of normalized income from 1987
to 2000, and the right panel is a semi-log plot of them. These figures show
that the normalized distributions almost coincide, except in the high income
range. Even in the high income range the distributions almost coincide, but
those of some years clearly deviate from the stationary distribution. In
addition, the power law distribution is not applicable in such cases. This
behavior occurs in an asset-inflation economic bubble [6].
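
The normalization step can be sketched as follows: divide every income by the sample average and tabulate the cumulative distribution of the result. This is only an illustration on a made-up list of incomes, not the tabulated NTAJ data, and the function name `normalized_ccdf` is hypothetical.

```python
def normalized_ccdf(incomes):
    """Return (x / mean, P(X >= x)) pairs sorted by increasing income.
    Ties are not merged, so tied values receive consecutive probabilities."""
    mean = sum(incomes) / len(incomes)
    xs = sorted(incomes)
    n = len(xs)
    return [(x / mean, (n - k) / n) for k, x in enumerate(xs)]


# Made-up incomes; rescaling all of them by a common factor (for example,
# uniform nominal growth) leaves the normalized distribution unchanged.
year_a = [1.0, 2.0, 3.0, 6.0]
year_b = [2.0 * x for x in year_a]

print(normalized_ccdf(year_a))
```

This invariance under a common rescaling is what allows distributions from different years to be overlaid on a single plot.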

3 Modeling of personal income distribution 

The empirical facts found in the previous section are as follows. 

1. The distribution of high income earners follows the power law distribution,
and its exponent, the Pareto index, fluctuates around $\alpha = 2$.

2. The main income sources of high income earners are wages and capital 
gains. 

3. Excluding high income earners, the main income source is wages. 

4. The distribution normalized by the average income can be regarded as a
stationary distribution.

Hence, it is reasonable to regard income as the sum of wages and capital 
gains. However, to model capital gains, we must model the asset accumulation 
process. In the following we explain an outline of our model. Details of our 
model are found in Ref. [12]. 

3.1 Wage process

We denote the wages of the $i$-th person at time $t$ as $w_i(t)$, where
$i = 1, \ldots, N$. We assume that the wage process is given by

$$w_i(t+1) = \begin{cases} u w_i(t) + s \epsilon_i(t) \bar{w}(t) & \text{if } u w_i(t) + s \epsilon_i(t) \bar{w}(t) > \bar{w}(t), \\ \bar{w}(t) & \text{otherwise}, \end{cases} \qquad (1)$$

where $u$ is the trend growth of wage, and reflects an automatic growth in
nominal wage. In this article we use $u = 1.0422$. This value corresponds to
the average inflation rate for the period from 1961 to 1999. In Eq. (1),
$\epsilon_i(t)$ follows a normal distribution with mean 0 and variance 1, i.e.,
$N(0, 1)$, and $s$ determines the level of income for the middle class. We
choose $s = 0.32$ to fit the middle part of the empirical distribution. In
Eq. (1), $\bar{w}(t)$ is the reflective lower bound, which is interpreted as a
subsistence level of income. We assume that $\bar{w}(t)$ grows
deterministically,

$$\bar{w}(t) = v^t \bar{w}(0).$$

Here we use $v = 1.0673$. This value corresponds to the time-average growth
rate of the nominal income per capita.
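
The wage process can be sketched as a short simulation. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the wage equation is read as a trend term plus a shock scaled by the lower bound, with reflection at that bound; the initial condition (everyone starts at the bound, with $\bar{w}(0) = 1$), the population size, and the function names `step_wages` and `simulate_wages` are hypothetical choices.

```python
import random


def step_wages(wages, w_bar, u=1.0422, s=0.32, rng=random):
    """One step of the wage process: grow each wage by u, add the shock
    s * eps * w_bar with eps ~ N(0, 1), and reset the wage to the lower
    bound w_bar whenever the result does not exceed it."""
    new_wages = []
    for w in wages:
        candidate = u * w + s * rng.gauss(0.0, 1.0) * w_bar
        new_wages.append(candidate if candidate > w_bar else w_bar)
    return new_wages


def simulate_wages(n=2_000, steps=50, v=1.0673, seed=0):
    """Iterate the wage process; returns wages(steps) and w_bar(steps)."""
    rng = random.Random(seed)
    w_bar = 1.0                 # hypothetical initial lower bound w_bar(0)
    wages = [w_bar] * n         # hypothetical initial condition
    for _ in range(steps):
        wages = step_wages(wages, w_bar, rng=rng)
        w_bar *= v              # w_bar(t) = v**t * w_bar(0)
    return wages, w_bar


wages, w_bar = simulate_wages()
mean_wage = sum(wages) / len(wages)
print(mean_wage / w_bar)        # average wage relative to the current bound
```

Because the bound grows faster than the trend ($v > u$), wages measured relative to $\bar{w}(t)$ are pushed back toward the bound on average while the shocks spread them out, which is consistent with a stationary normalized distribution.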



3.2 Asset accumulation process 

We denote the asset of the $i$-th person at time $t$ as $a_i(t)$. We assume that
the asset accumulation process is given by a multiplicative process,

$$a_i(t+1) = \gamma_i(t) a_i(t) + w_i(t) - c_i(t), \qquad (2)$$

where the log return, $\log \gamma_i(t)$, follows a normal distribution with
mean $y$ and variance $x^2$, i.e., $N(y, x^2)$. We use $y = 0.0595$. This is a
time-average growth rate of the Nikkei average index from 1961 to 1999. We use
$x = 0.3122$. This value is calculated from the distribution of the income
growth rate for high income earners. In Eq. (2), we assume that the consumption
function, $c_i(t)$, is given by

$$c_i(t) = \bar{w}(t) + b\,\{a_i(t) + w_i(t) - \bar{w}(t)\}.$$

In this article we choose $b = 0.059$ from the empirical range estimated from
Japanese micro data.
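
A single update of the asset process can be sketched as below. This is an illustration under assumptions rather than the authors' code: the consumption function is read as the subsistence level plus a fraction $b$ of the resources above it, and the function names and the numerical state (assets 100, wage 5, lower bound 1) are made up for the example.

```python
import math
import random


def consumption(a, w, w_bar, b=0.059):
    """Consumption: subsistence level w_bar plus a fraction b of the
    resources (assets plus wage) above it."""
    return w_bar + b * (a + w - w_bar)


def step_asset(a, w, w_bar, y=0.0595, x=0.3122, b=0.059, rng=random):
    """One step of the asset accumulation process for a single person."""
    gamma = math.exp(rng.gauss(y, x))   # log gamma ~ N(y, x^2)
    return gamma * a + w - consumption(a, w, w_bar, b)


# Made-up state held fixed for ten steps, purely for illustration.
rng = random.Random(0)
a = 100.0
for t in range(10):
    a = step_asset(a, w=5.0, w_bar=1.0, rng=rng)
print(a)
```

Substituting the consumption function into the update gives the equivalent form $a_i(t+1) = [\gamma_i(t) - b]\,a_i(t) + (1 - b)[w_i(t) - \bar{w}(t)]$, which makes explicit that assets follow a multiplicative process with an additive term driven by wages above the subsistence level.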



3.3 Income distribution derived from the model 

We denote the income of the $i$-th person at time $t$ as $I_i(t)$, and define it as

$$I_i(t) = w_i(t) + E[\gamma_i(t) - 1]\, a_i(t).$$

The results of the simulation for $N = 10^6$ are shown in Fig. 5. The left panel
of Fig. 5 is a log-log plot of the cumulative distribution for income normalized
by the average. The right panel of Fig. 5 shows the simulation results for the
Lorenz curve. These figures show that our model accounts well for the empirical
data.

Fig. 5. A log-log plot of the cumulative distributions of normalized income in 1999
and simulation results (left), and the Lorenz curve in 1999 and simulation results
(right).
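
The income definition and the Lorenz curve can be illustrated with a few helper functions. The sketch assumes that $E[\cdot]$ denotes the expectation over the log-normal return, so that $E[\gamma] = \exp(y + x^2/2)$; the function names are hypothetical, and the Gini coefficient is included only as a convenient one-number summary of the Lorenz curve.

```python
import math


def expected_net_return(y=0.0595, x=0.3122):
    """E[gamma - 1] when log gamma ~ N(y, x^2) (mean of a log-normal)."""
    return math.exp(y + 0.5 * x * x) - 1.0


def income(w, a, y=0.0595, x=0.3122):
    """Income as wage plus expected return on assets."""
    return w + expected_net_return(y, x) * a


def lorenz_curve(incomes):
    """Points (population share, income share), poorest first."""
    xs = sorted(incomes)
    total = sum(xs)
    n = len(xs)
    points, running = [(0.0, 0.0)], 0.0
    for k, value in enumerate(xs, start=1):
        running += value
        points.append((k / n, running / total))
    return points


def gini(incomes):
    """Gini coefficient as 1 - 2 * (area under the Lorenz curve),
    with the area computed by the trapezoid rule."""
    pts = lorenz_curve(incomes)
    area = sum((p1[0] - p0[0]) * (p0[1] + p1[1]) / 2.0
               for p0, p1 in zip(pts, pts[1:]))
    return 1.0 - 2.0 * area


print(expected_net_return())      # about 0.114 with the quoted y and x
print(gini([1.0, 1.0, 1.0, 1.0])) # 0 for perfect equality, up to rounding
```

With the quoted parameters the expected net return is roughly 11% per period, so asset income dominates wages only for persons whose assets are several times their wage.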

In our model, the exponent in the power law part of the distribution is derived
from the asset accumulation process. From Eq. (2), we can analytically derive

$$\alpha = 1 - \frac{2\log(1 - z/g)}{x^2} \approx 1 + \frac{2z}{g x^2}, \qquad (3)$$

where $z$ is a steady state value of $[w(t) - c(t)]/\langle a(t) \rangle$. Here
$\langle a(t) \rangle$ is the average assets. In Eq. (3), $g$ is a steady state
value of the growth rate of $\langle a(t) \rangle$. Equation (3) shows that
$\alpha$ fluctuates around $\alpha = 2$ if $2z \sim g x^2$.
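
The condition for $\alpha \approx 2$ can be checked numerically. The sketch below assumes that the tail exponent formula reads $\alpha = 1 - 2\log(1 - z/g)/x^2 \approx 1 + 2z/(g x^2)$, which is a reconstruction from the surrounding definitions; the values of `g` and `z` are purely hypothetical and are chosen so that $2z = g x^2$ holds exactly.

```python
import math


def pareto_index(z, g, x=0.3122):
    """Tail exponent from the assumed exact form 1 - 2*log(1 - z/g) / x^2."""
    return 1.0 - 2.0 * math.log(1.0 - z / g) / (x * x)


def pareto_index_approx(z, g, x=0.3122):
    """First-order approximation 1 + 2*z / (g * x^2), valid for small z/g."""
    return 1.0 + 2.0 * z / (g * x * x)


# Hypothetical values: pick g, then choose z so that 2*z = g * x^2.
x = 0.3122
g = 0.05
z = g * x * x / 2.0

print(pareto_index_approx(z, g, x))  # 2 up to rounding, by construction
print(pareto_index(z, g, x))         # slightly above 2
```

The gap between the two values comes from the higher-order terms of $-\log(1 - z/g) = z/g + (z/g)^2/2 + \cdots$, so the approximation improves as $z/g$ becomes small.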



4 Summary 

In this article we empirically studied income distribution, and constructed a
model based on empirical facts. The simulation results of our model can explain
the real distribution. In addition, our model can explain why the Pareto index
fluctuates around $\alpha = 2$. However, many questions remain open. For
example, we have no theory that can explain the income distribution under the
bubble economy, that can determine the functional form of the distribution
outside the high income range, or that can explain the shape of the income
growth distribution.